User Behavior Research & A/A/B Test for an Online Food Product Store
I work at a startup that sells food products.
I need to investigate user behavior in the company's app.
Based on the tests, we will decide whether the change proposed by the developers yields positive results
for the conversion rate at each event in the funnel.
How?
User Behavior-
Tests-
import pandas as pd
import numpy as np
import datetime
from datetime import datetime
from scipy import stats
import math
import warnings
import plotly.express as px
import plotly.graph_objects as go
df = pd.read_csv('/datasets/logs_exp_us.csv',sep='\t')
df.info()
df.head(3)
Checking for zero values-
def zeros(data):
    # For every column, print the count of zero values; if any exist, also print their share
    for i in data.columns:
        zero_count = len(data[data[i] == 0])
        if zero_count == 0:
            print(i, zero_count)
        else:
            print(i, zero_count, round(zero_count / len(data[i]), 3))

zeros(df)
df.describe(include = 'all')
print('There are',df.EventName.nunique(),'unique events:\n',df.EventName.unique())
df.columns = ['event_name','user_id','timestamp','experiment_id']
df.head(2)
df['timestamp'] = df['timestamp'].apply(lambda x: datetime.fromtimestamp(x))
# the original format='%Y:%M:%D' argument was invalid (%M is minutes, %D is unsupported);
# the column is already datetime, so we can extract the parts directly
df['date'] = df['timestamp'].dt.date
df['time'] = df['timestamp'].dt.time
df.head(2)
df.timestamp.describe()
The first date documented is 2019-07-25 and the last date documented is 2019-08-07.
In total the logs cover a span of 13 days.
df.duplicated().sum()
for i in df[df.duplicated()].columns:
    print(i, ':', df[df.duplicated()][i].nunique())
df.drop_duplicates(inplace=True)
df.info()
In this section we will get to know the data in more depth.
We will calculate some statistics, filter the relevant time period for the test,
and examine whether and how much the filtering affected the data and the experimental groups.
print('There are {} unique events in the logs , and {:,} events overall.'.format(df.event_name.nunique(),df.shape[0]))
print('There are {:,} unique users in the logs.'.format(df.user_id.nunique()))
print('The average number of events per user is: {:.2f}.'.format(df.groupby('user_id')['event_name'].count().mean()))
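The mean events per user can be inflated by a handful of very active users, so a median is a useful robustness check. A minimal sketch on hypothetical per-user counts (in the notebook, the series would be `df.groupby('user_id')['event_name'].count()`):

```python
import pandas as pd

# Hypothetical per-user event counts, not the real logs
events_per_user = pd.Series([3, 5, 7, 9, 200])

mean_events = events_per_user.mean()      # pulled up by the one heavy user
median_events = events_per_user.median()  # robust to the outlier
print(mean_events, median_events)
```

If the median is far below the mean, a few power users dominate the average.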
We will examine whether all the information in the logs can be used, in terms of the frequency of events by date-
date_max = df.date.max()
date_min = df.date.min()
delta = date_max-date_min
print('The earliest date that appears in Logs is: {}, and the latest date is: {}.'.format(date_min,date_max))
print('The period of time the data covers is {} days.'.format(delta.days))
fig = px.histogram(df, x="timestamp", opacity=0.7)
fig.update_layout(
    title_text='Events by Date & Time Histogram',
    xaxis_title_text='Events',
    yaxis_title_text='Frequency')
fig.update_xaxes(
    dtick="d1",
    tickformat="%d %B\n%Y")
fig.update_traces(marker_color='cadetblue')
fig.show()
From the histogram it is quite clear that the data only becomes complete on the first of August.
So that the overall picture is not skewed, we need to ignore the data recorded before this date.
The data actually represents the dates 2019-08-01 to 2019-08-07.
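The visual cutoff can also be checked numerically with a per-day event count. A sketch on toy daily totals (hypothetical numbers; in the notebook the equivalent one-liner is `df.groupby('date').size()`):

```python
import pandas as pd

# Toy daily event totals (hypothetical): a trickle before Aug 1, full volume after
daily = pd.Series({'2019-07-30': 5, '2019-07-31': 12,
                   '2019-08-01': 3000, '2019-08-02': 3100})

# Share of events that an Aug 1 cutoff would drop
dropped_share = daily[daily.index < '2019-08-01'].sum() / daily.sum()
print('{:.2%} of events fall before the cutoff'.format(dropped_share))
```

A small dropped share confirms the cutoff loses little information.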
In the previous section we found that the period that can be processed starts from the first of August until the end of the logs.
We will now filter the data and examine how much information we lost in terms of users and events.
n_df =df.query('timestamp > "2019-07-31 23:59:59"')
n_df.head(2)
Now, we check how many users we lose when filtering the older data:
filt_user_id = df.query('timestamp <= "2019-08-01 00:00:00"')['user_id']
unq_filt_user_id = filt_user_id.nunique() - n_df.loc[n_df['user_id'].isin(filt_user_id)].user_id.nunique()
print('We lost {} unique users; their share of all unique users is: {:.4%}.'.format(
    unq_filt_user_id, unq_filt_user_id / df.user_id.nunique()))
Now, we'll check how many events we lost:
filt_event_id = df.query('timestamp <= "2019-08-01 00:00:00"')['timestamp'].nunique()
print('We lost {} unique events; their share of total events is: {:.4%}'.format(filt_event_id, filt_event_id / df.timestamp.nunique()))
Now, we'll check how many events we lost by event type:
#share of event types we lose:
event_type = df.event_name.value_counts().reset_index()
filt_event_type = df.query('timestamp <= "2019-08-01 00:00:00"').event_name.value_counts().reset_index()
filt_event_type['share'] = round(filt_event_type.event_name/event_type.event_name,4)
filt_event_type.columns = ['event_name','number','share']
print('The number of events we lost by type, and their share, is: \n\n{}'.format(filt_event_type))
We lose in total 1.47% of events when excluding the data before Aug 1st.
In terms of event type:
We lose 1.5% of main screen events, 1% of offer screen events, 0.8% of cart screen events, 0.6% of payment screen events, and 1.2% of tutorial events.
Now, we'll check how many events we lost in terms of the groups (246, 247, 248):
g_before = df.experiment_id.value_counts()
g_after = n_df.experiment_id.value_counts()
print('Number of events before filtering-')
display(g_before)
print('Number of events after filtering-')
display(g_after)
print('The share by group that remains after the filtering is:\n\n{}'.format(g_after/g_before))
We do not appear to have lost many events in any of the experimental groups (less than 2% per group).
For calculating proportions we need to know the number of users in each group.
Let's check the number of users in every group before and after filtering, and the ratio:
before_g = df.groupby('experiment_id')['user_id'].nunique().reset_index()
after_g = n_df.groupby('experiment_id')['user_id'].nunique().reset_index()
groups = before_g.merge(after_g , how= 'left', on='experiment_id')
groups.columns = ['experiment_id','before','after']
groups['ratio'] = groups.after/groups.before
groups
We do not appear to have lost many users in any of the experimental groups (less than 2% per group).
Logs-
Users-
Events-
After examining and filtering the data, there is no noticeable problem with relying on the remaining data; it constitutes the vast majority of the logs.
At this step we will study in depth the behavior of users in terms of conversion rates between events.
We will look at a common sequence of events and the number of users who complete it, also from the perspective of the experimental groups.
In addition, we will identify the event after which the highest loss of users occurs.
n_df.event_name.value_counts().reset_index()
As expected, the most frequent event is "MainScreenAppear"; it constitutes almost 50% of all events.
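That share can be read directly from a normalized `value_counts`. A sketch on illustrative counts (hypothetical round numbers, not the actual log totals; in the notebook: `n_df.event_name.value_counts(normalize=True)`):

```python
import pandas as pd

# Illustrative event counts, hypothetical values
counts = pd.Series({'MainScreenAppear': 117000, 'OffersScreenAppear': 46000,
                    'CartScreenAppear': 42000, 'PaymentScreenSuccessful': 34000,
                    'Tutorial': 1000})

# normalize the counts to shares of the total
shares = counts / counts.sum()
print(shares.round(3))
```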
event_users = n_df.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name', ascending=False)
event_users
once = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index().query('timestamp==1')
once = once.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name', ascending=False)
once
several = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index().query('timestamp >1')
several = several.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name', ascending=False)
several
merged = event_users.merge(once, how = 'left', on = 'event_name')
merged.columns = ['event_name','count_all','count_once']
merged['prop_%'] = merged['count_once']/merged['count_all']*100
merged['twice_and_more_%'] = 100-merged['prop_%']
print('This is the share of users that performed each action exactly once vs. more than once, out of all users who performed it:\n\n{}'.format(merged[['event_name','prop_%','twice_and_more_%']]))
How many users completed all the steps on their way to the payment screen?
actions = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
actions =actions.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
actions.dropna(inplace = True)
print('There are', actions.shape[0], 'users who completed all the steps.\nThey make up {:.0%} of all users.'.format(
    actions.user_id.nunique() / n_df.user_id.nunique()))
We will now review the sequences that users performed on the way to payment.
For convenience we will define:
T = Tutorial
A = MainScreenAppear
B = OffersScreenAppear
C = CartScreenAppear
D = PaymentScreenSuccessful
We'll create a table of users and the steps they performed:
seq = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
seq =seq.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
seq = seq[['user_id','MainScreenAppear','OffersScreenAppear','CartScreenAppear','PaymentScreenSuccessful','Tutorial']]
seq.columns = ['user_id','A','B','C','D','T']
seq.sample(3)
We will now see the possible sequences and step-by-step conversion:
seq_A = seq.query('A==1')['user_id'].nunique()
seq_A_B = seq.query('A==1 & B==1')['user_id'].nunique()
seq_A_B_C = seq.query('A==1 & B==1 & C==1')['user_id'].nunique()
seq_A_B_C_D = seq.query('A==1 & B==1 & C==1 & D==1')['user_id'].nunique()
###
seq_A_C = seq.query('A==1 & C==1')['user_id'].nunique()
seq_A_C_D = seq.query('A==1 & C==1 & D==1')['user_id'].nunique()
###
seq_A_B_D = seq.query('A==1 & B==1 & D==1')['user_id'].nunique()
###
seq_A_D = seq.query('A==1 & D==1')['user_id'].nunique()
fig = go.Figure()
fig.update_layout(title="Funnels of Different Sequences")
fig.add_trace(go.Funnelarea(
    scalegroup = "second", values = [seq_A, seq_A_B, seq_A_B_C, seq_A_B_C_D], textinfo = "value",
    title = {"position": "top center", "text": "Seq: A->B->C->D"},
    marker = {"colors": ["deepskyblue", "tan", "teal", "silver"]},
    domain = {"x": [0, 0.5], "y": [0.55, 1]}))
fig.add_trace(go.Funnelarea(
    scalegroup = "second", values = [seq_A, seq_A_B, seq_A_B_D], textinfo = "value",
    title = {"position": "top center", "text": "Seq: A->B->D"},
    marker = {"colors": ["deepskyblue", "tan", "silver"]},
    domain = {"x": [0.55, 1], "y": [0.55, 1]}))
fig.add_trace(go.Funnelarea(
    scalegroup = "first", values = [seq_A, seq_A_C, seq_A_C_D], textinfo = "value",
    title = {"position": "top center", "text": "Seq: A->C->D"},
    marker = {"colors": ["deepskyblue", "teal", "silver"]},
    domain = {"x": [0, 0.5], "y": [0, 0.5]}))
fig.add_trace(go.Funnelarea(
    scalegroup = "first", values = [seq_A, seq_A_D], textinfo = "value",
    title = {"position": "top center", "text": "Seq: A->D"},
    marker = {"colors": ["deepskyblue", "silver"]},
    domain = {"x": [0.55, 1], "y": [0, 0.5]}))
fig.update_layout(
    margin = {"l": 200, "r": 200}, shapes = [
        {"x0": 0, "x1": 0.5, "y0": 0, "y1": 0.5},
        {"x0": 0, "x1": 0.5, "y0": 0.55, "y1": 1},
        {"x0": 0.55, "x1": 1, "y0": 0, "y1": 0.5},
        {"x0": 0.55, "x1": 1, "y0": 0.55, "y1": 1}])
fig.update_layout(legend=dict(title = "Event:\n 0=A,1=B,2=C,3=D"))
fig.show()
As we can see from the funnels above, not all events are part of a single sequence.
Users can reach the payment event through different sequences.
names = ["A - Main Screen", "B - Offers", "C - Cart", "D - Payment"]
evnn = event_users.sort_values(by='user_id',ascending=False)
fig=go.Figure(go.Funnel(
y = names,
x = evnn.user_id,
textinfo = "value+percent previous",
marker = {"color": "teal"}))
fig.update_layout(title="Event Funnel: Total Users")
fig.show()
funnel_by_group = []
for i in n_df.experiment_id.unique():
    group = n_df[(n_df.experiment_id == i) & (n_df.event_name != 'Tutorial')].groupby(
        ['event_name', 'experiment_id'])['user_id'].nunique().reset_index().sort_values(by='user_id', ascending=False)
    funnel_by_group.append(group)
funnel_by_groups = pd.concat(funnel_by_group)
funnel_by_groups
fig = px.funnel(funnel_by_groups, x='user_id', y = 'event_name', color = 'experiment_id')
fig.update_traces(textinfo = "value+percent previous+percent initial")
fig.show()
funnel_shift = n_df.query('event_name!= "Tutorial"').groupby('event_name')['user_id'].nunique().sort_values(ascending =False).reset_index()
funnel_shift['perc_ch'] = funnel_shift['user_id'].pct_change()
funnel_shift
fig = px.bar(funnel_shift, x="event_name", y="perc_ch", color="event_name", title="Share of users lost by event")
fig.show()
From the graph above, we can see that the largest loss of users (close to 40%) happens between the first and second events!
We suggest the developers examine the main landing page; this is much more important than changing the font!
all_ev = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
all_eve = all_ev.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
journy_all_events = all_eve.dropna().count()[0]
journey = n_df.query('event_name == "PaymentScreenSuccessful"').user_id.nunique()
journey
print('The number of users that made the entire journey is {} and their share of all users is: {:.2%}.'.format(journey, journey / n_df.user_id.nunique()))
If so, at this point I would suggest that the dev department offer an alternative main screen, on which we will perform the next A/B test.
At this stage we will perform an A/A test on the proportions for each event between the control groups; if it turns out well, we will perform an A/B test for each of the control groups together with the test group.
We can then conclude whether the proposed change yields positive results or is ineffective.
A_A_B = n_df.groupby(['experiment_id'])['user_id'].nunique().reset_index()
A_A_B
Before we perform the test, we must make sure that there are no users in more than one group-
n_df.groupby(['user_id'])['experiment_id'].nunique().reset_index().query('experiment_id>1')
We found that no user appears in more than one group; we can continue!
We will now create a table that contains the number of unique users per event, divided by experiment group:
pivot = n_df.pivot_table(index='event_name', values='user_id', columns='experiment_id', aggfunc=lambda x: x.nunique()).reset_index()
pivot
It seems that the most popular event is the main screen event, with the highest number of users in each group.
We will now prepare the function that will perform the tests between 2 selected groups for every event:
def check_hypothesis(group1, group2, event, alpha=0.05):
    print('Event name:', event)
    print('--------------------------')
    print('Hypothesis definition:')
    print('--------------------------')
    print('* The significance level (alpha) is: {:.1%}'.format(alpha))
    print('* H0: The proportion of successes in the event "{}" is equal for group B ({}) and group A ({})'.format(event, group1, group2))
    print('* H1: The proportion of successes in the event "{}" differs between group B ({}) and group A ({})'.format(event, group1, group2))
    print('==========================')
    print('* Test Result: *')
    print('==========================')
    # successes: unique users in each group that reached the event (from the pivot table)
    successes1 = pivot[pivot.event_name == event][group1].iloc[0]
    successes2 = pivot[pivot.event_name == event][group2].iloc[0]
    # trials: all unique users in each group (from the filtered data)
    trials1 = n_df[n_df.experiment_id == group1]['user_id'].nunique()
    trials2 = n_df[n_df.experiment_id == group2]['user_id'].nunique()
    # proportion of successes in each group
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    # pooled proportion under H0
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = p1 - p2
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
    distr = stats.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', p_value)
    if p_value < alpha:
        print('Reject H0 for', event, 'and groups', group1, group2)
    else:
        print('Fail to Reject H0 for', event, 'and groups', group1, group2)
    print('**********************************************************************************************************\n')
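As a sanity check of the pooled two-proportion z-test used above, the same formula can be run standalone on toy numbers (hypothetical successes and trials, not the experiment's):

```python
import math
from scipy import stats

def z_test_proportions(successes1, trials1, successes2, trials2):
    # pooled proportion under H0: p1 == p2
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = successes1 / trials1 - successes2 / trials2
    se = math.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
    z_value = difference / se
    # two-sided p-value from the standard normal distribution
    return (1 - stats.norm.cdf(abs(z_value))) * 2

# Nearly equal conversion rates should not be flagged as different
p_value = z_test_proportions(450, 1000, 460, 1000)
print('p-value:', p_value)
```

With 45.0% vs. 46.0% on 1,000 trials each, the p-value is far above 0.05, as expected for such a small difference.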
We will now examine the events in order to check if there is statistically significant difference between the two control groups:
We'll take alpha to be 5%-
a = 1
for i in pivot.event_name.unique():
    print('Test number', a)
    check_hypothesis(246, 247, i, alpha=0.05)
    a = a + 1
As per our findings above, our two control groups in the A/A test (samples 246 and 247) have no statistically significant difference between them.
Now we will examine the events to check whether there is a statistically significant difference between each control group and the test group:
Initially we will test with a significance level of 5%.
Afterwards we will test with a significance level of 10%.
print('\t\t\t\t\t\tA=246, B=248\n')
a = 1
for i in pivot.event_name.unique():
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.05)
    a = a + 1
It seems that with alpha = 5%, there is no significant difference between the groups.
print('\t\t\t\t\t\tA=247, B=248\n')
a = 1
for i in pivot.event_name.unique():
    print('Test number', a)
    check_hypothesis(247, 248, i, alpha=0.05)
    a = a + 1
It seems that with alpha = 5%, there is no significant difference between the groups.
Now We'll take alpha to be 10%:
print('\t\t\t\t\t\tA=246, B=248\n')
a = 1
for i in pivot.event_name.unique():
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.1)
    a = a + 1
We found that at a significance level of 10%, there is a significant difference in the "CartScreenAppear" event.
Is the rejection of the null hypothesis due to a real difference between the two groups in this event?
Or is the difference an artifact of not correcting the significance level for the multiple tests we performed?
We will answer this question in the next step-
Bonferroni correction: alpha_new = alpha / m, where alpha is the original significance level and m is the number of tests.
alpha = 0.1
m = 4 (without the Tutorial event)
alpha_new = 0.1 / 4 = 0.025
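A quick sketch of why the correction matters: without it, the family-wise error rate (the chance of at least one false positive across m independent tests) far exceeds the nominal alpha:

```python
# Bonferroni correction for m tests at family-wise level alpha
alpha = 0.1
m = 4  # events tested, excluding the Tutorial event
alpha_bonferroni = alpha / m  # 0.025

# Probability of at least one false positive across m independent tests
fwer_uncorrected = 1 - (1 - alpha) ** m
fwer_corrected = 1 - (1 - alpha_bonferroni) ** m
print(fwer_uncorrected, fwer_corrected)  # ~34% uncorrected vs. under 10% corrected
```

This is why a single rejection at alpha = 0.1 across four tests is weak evidence on its own.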
a = 1
for i in pivot.event_name.unique():
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.10/4)
    a = a + 1
After the Bonferroni correction, we find no significant difference between the groups; the earlier rejection without the correction was probably a false positive.
At this point we wanted to find out whether the change the development department wants to make would be effective on the step-by-step conversion rate and, in general, whether it would affect the overall conversion rate.
The findings are-
To summarize the tests-
It seems that the proposed change did not yield results that constitute a significant improvement for the company,
so I would recommend not adopting the change.
In the analysis process we performed the following:
Preparation for processing-
Learning the data-
Learning the funnel-
A/B Tests-
After studying and analyzing, I will give the following recommendations: